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Abstract 

UPGMA is a heuristic method identifying the least squares equidis- 
tant phylogenetic tree given empirical distance data among n taxa. 
We study this classic algorithm using the geometry of the space of all 
equidistant trees with n leaves, also known as the Bergman complex 
of the graphical matroid for the complete graph K n . We show that 
UPGMA performs an orthogonal projection of the data onto a maxi- 
mal cell of the Bergman complex. We also show that the equidistant 
tree with the least (Euclidean) distance from the data is obtained from 
such an orthogonal projection, but not necessarily given by UPGMA. 
Using this geometric information we give an extension of the UPGMA 
algorithm. We also present a branch and bound method for finding 
the best equidistant tree. Finally, we prove that there are distance 
data among n taxa which project to at least (n — 1)! equidistant trees. 

1 Introduction 

We study the problem of finding the least squares equidistant tree given 
distance data between the elements of a finite set X of cardinality n. The 
set X is often a collection of taxa in biological applications. In this paper we 
will usually let X — {1, 2, . . . , n} unless otherwise stated. The distance data 
is given by a dissimilarity map, a real-valued function d : ('"') — ► K. denned 
for pairs where 1 < i < j < n. We will represent a dissimilarity map 
by the edge weights of the complete graph K n of n vertices where the weight 
of the edge in K n is d(i,j). 

Let T be a (not necessarily binary) weighted tree with a root r and n leaves. 
The weight of each edge e G T will be denoted by wx(e), and we will omit the 



subscript when the context makes it clear which tree we refer to. Given such 
a tree we get a distance function x(a, b) = ^2 eeP i«r(e) where P a ,b is the 
unique path between the nodes a and b in T. A tree is called equidistant if 
x(i, r) is the same real number for each leaf i — 1, . . . , n. Note that we label 
the leaves of T by X. A tree is equidistant if and only if for each distinct 
i, j, k e X, the set of distances {x(i, j), x(i, k),x(j, k)} achieves its maximum 
at least twice [U [19], [22] . These are the ultrametric conditions. 

With these definitions we can present the main problem of this paper: given 
a dissimilarity map d on X = {1, . . . ,n} find an equidistant tree T on n 
leaves such that 

l<i<j<n 

is minimized. It is known that this problem (as well as the unrooted nonequidis- 
tant version) is NP complete [T3j fl"^ [ [To]. 

The problem of tree construction arises in biology, where the goal is to de- 
scribe the evolutionary history of species or genes. An equidistant tree ap- 
proximates the true evolutionary history. The distances between species 
may be measured using several different methods, but currently distances 
are most often determined by comparison of aligned nucleic acid or amino 
acid sequences. One of several models of evolution is used to correct for the 
possibility of multiple substitutions at any one site [10j. When the rate of 
nucleotide or amino acid substitution was constant over the time period be- 
ing considered, the ultrametric conditions are close to being satisfied. This 
condition is the molecular clock hypothesis, and if it holds a least squares 
equidistant tree could be used to fit the distance data. Least squares methods 
for tree construction are attractive because they are statistically consistent: 
the correct tree will be identified in the limit as the length of the sequences 
grows [HI [ID] . In many cases the molecular clock hypothesis is not satisfied, 
and trees that are additive but not equidistant are preferred. 
The Unweighted Pair Group Method with Arithmetic Means (UPGMA) algo- 
rithm is a heuristic method for finding the least squares equidistant tree [10] . 
The UPGMA algorithm has polynomial time complexity, and works well on 
data which shows clock-like behavior. Even if the molecular clock holds, 
however, the UPGMA algorithm may return a tree that is not the best by 
the least squares criterion, as shown in Example 2.5 below. The unweighted 
least squares approach was first suggested by Cavalli-Sforza and Edwards 
[SJ. Other, related algorithms include the pioneering weighted least squares 
algorithm of Fitch and Margoliash [TT], the transformed distances method 
[8], and neighbor-joining [18]. Neighbor-joining and recent variants BIONJ 
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[T2] and weighbor [3] are not strictly least squares algorithms. Of all these 
UPGMA is particularly interesting here, as it arises naturally as a greedy 
algorithm from the approach described below. When the Euclidean metric 
is replaced by metric, a fast exact algorithm is known [6]. A conceptual 
explanation of this algorithm is given in pQ . 

Here we first describe UPGMA. We will present a version that outputs 
the combinatorial description of T and x(i,j) for each pair of leaves i and 
j. It is well-known how to compute the edge weights wr(e) from these data. 
Recall that we represent the dissimilarity map d as the edge weights of K n . 

Algorithm 1.1. UPGMA 

Input : Complete graph K n with edge weights d{i,j). 

Output: An equidistant tree T with leaves X = {1, . . . ,n} and x(i,j) for 

each i, j G X. 



G := K n . 

V{T) := X, E{T) := 0, and S(T) := X. 
repeat 

minave := min — d(i, j) 

v,w£V(G) C(V,W) ^ K ' 

where E{y , w) is the set of edges between the nodes v and w in G, and 
C(v,w) = \E(v,w)\. 

Let s and t in V(G) be the vertices for which the minimum above is attained. 
Set x(i,j) := minave for all G E(s,t). 

G := G/{s,t}, obtained by contracting the vertices s and t into a single 

vertex st := s U t. 

S(T):=S(T)\{s,t}U{st}. 

V(T) := V(T) U {st}, E(T) := E(T) U {st, s} U {st, t}. 
until G has one vertex 



Output T and x(i,j) 1 < i < j < n. □ 

In Section [2] we describe the Bergman complex B n , namely, the space of 
all equidistant trees on n leaves. We prove that Algorithm 1.1 performs an 
orthogonal projection (with respect to the usual Euclidean inner product) 
onto a maximal cell of B n . We give an example where already for n = 4 
the UPGMA tree can be arbitrarily worse than the best equidistant tree. In 
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Section [3] we prove that the best equidistant tree is obtained by an orthog- 
onal projection onto some maximal cone of B n . Motivated by this result we 
introduce a polyhedral subdivision of the data space r( 2 ) where each maxi- 
mal cell consists of data vectors which project onto the same set of maximal 
Bergman cells. We show that the collection of such Bergman cells could be 
disconnected (in a sense made precise in Section [3]). In fact, there are data 

vectors in IrW which project onto at least (n — 1)! Bergman cells, and we 
conjecture that this is the most number of projections one can obtain. Fur- 
thermore, we classify all data vectors in IR 6 which project onto six Bergman 
cells. In Section [4] we introduce two algorithms based on our results in Sec- 
tion [3j One of them is an extension of UPGMA that finds at least as good 
a tree as the UPGMA tree. The other one finds the best equidistant tree 
using a branch and bound approach. Section [5] concludes with an example 
where we analyze data for the timing and the sequence of the appearance of 
mammalian orders. 



2 The Bergman complex and UPGMA 



It is not difficult to show that Algorithm 1.1 indeed returns an equidistant 
tree using the ultrametric characterization of equidistant trees. Here we will 
describe the space of all vectors x = (x(i,j) : 1 < i < j < n) e Ir( 2 ) which 
come from weighted equidistant trees with n leaves, and from this description 
it will follow that the UPGMA produces an equidistant tree. Ardila and 
Klivans [2] described this space as a special case of the tropicalization of 
a linear variety, or more combinatorially, as the Bergman complex B n of 
the graphical matroid of K n . This description shows that B n C is a 
polyhedral complex of dimension n — 1: its maximal cones are polyhedral 
cones of dimension n — and any collection of them intersects in a face that 
belongs to each cone in the collection. 

We first describe a different polyhedral complex T n of dimension n — 1 that is 
a refinement of B n , i.e. the maximal cones of T n further subdivide the ones in 
B n . Given a graph G 01a m < n vertices which are labeled by disjoint subsets 
of [n], and two vertices labeled s and t we obtain G/{s,t}, the contraction 
of G on {s, t}, where 

V{G/{s,t}) = V(G)\{s,t}\J{st} and E(G/{s, t}) = E(G) \ E(s, t) 

where E(s,t) is the set of edges between the vertices s and t. We label the 
vertices K n with the singletons {1}, . . . , {n}, and we call a graph G obtained 
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Figure 1: Lattice of contractions of K4 

by a sequence of contractions from K n a contraction of K n . Contractions 
of K n form a lattice where H > G if H can be obtained by a sequence of 
contractions from G. This lattice is isomorphic to the partition lattice Il n 
which is in turn isomorphic to the lattice of flats of K n ordered by inclusion: 
a flat of K n is the set of edges that are not present in a contraction of K n . 
Figure [T] illustrates the lattice of contractions of K4. 

Now let T = {0 = F C F 1 C F 2 C ••• C F n _ 2 C F n _ x = ( [ J)} be 
a maximal chain of flats of K n obtained from K n by a sequence of n — 1 
contractions to K\ with the vertex label [n]. Note that Fi \ F^i is E(s, t) for 
the corresponding contraction. We define a cone that is associated to T as 

G T = {(x(t,j))ER^ : 

{x(k, I) = x(s, t) : (k, I), (a, t) e \ F } < 

= x(s, t) : (A;, I), (s, t)eF 2 \F 1 } <■■■< 

{x(k,l)=x(s,t) : (M),(M) efn-i\^ 2 }}. 
The set of Cjc as ranges over all maximal chains in U n is the maximal 
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cones of T n . As we mentioned above the maximal cones of the Bergman 
complex B n are refined by the cones in T n - Indeed, two cones C^i and C^2 
belong to the same maximal cone in B n if the chain of flats JF 1 and T 2 differ 
exactly in one flat, say F} ^ Ff, and {F} \ F}_^) n (Ff \ Ff_ x ) = 0. 



Example 2.1. There are two types of cones corresponding to two types of 
flag of flats in K 4 , namely, 

T = {0 C {(1, 2)} C {(1, 2), (1, 3), (2, 3)} C (^fj } and 

r = {0c{(l,2)}c{(l,2),(3,4)}c ^}. 

These go with two types of trees on four leaves: the comb and the fork in 
Figure [2} The corresponding maximal cones in JF 4 are: 



C T = G R b : x(l,2) < x(l,3) = £(2,3) < x(l,4) = x(2,4) = x(3,4) 

C T , = \{x{i,j)) el 6 : x(l,2) <x(3,4) <x(l,3) =x(l,4) =x(2,3) =x(2,4) 



Proposition 2.2. The UPGMA algorithm produces an equidistant tree. 

Proof. It is clear that this algorithm performs a sequence of contractions 
starting from K n and ending in K\. At iteration i of the repeat loop we let 
Fi \ to be E(s,t) that has been identified. The algorithm sets x(i,j) = 
x(k,l) for all (i,j),(k,l) G E(s,t). So we just need to show that x(a,b) < 
x(c, d) for (a, b) G Fi \ and (c, d) G F i+ i \ Fi. We let the edges identified 
in the (i + l)st loop to be E(v, w). There are two cases: either one of v or w 
is s U t or not. In the second case 

X ^ h) = ch) ^ d(hj) - W^) S d(i,j) = x(c,d). 
For the first case, without loss of generality we assume v = s U t and 

V ^ (i,j)EE{ S ,t) V ' (»J)eJS(«,tB) V ' (i,j)GE(t,w) 
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comb fork 



Figure 2: Comb and fork trees on four leaves 
Now since J2(i,j)eE(v,w) d (hj) is e( l ual to 

C(s,w) ( 1 ^ .,. .. ) C(t,w) ( 1 ^ .,. ,) 
V ' ; \ V ' ; (ij)eE(,,v,) j v ' ; \ v ' ; (i,j)eE(t,w) ) 

and C(f , w) = C(s, w) + C(t, w) we get the desired inequality in this 

well. □ 

In the rest of the paper, we will denote the cone in T n which the UPGMA 
identifies as Cjjpgma- 

Proposition 2.3. If (x(i,j)) is the vector that the UPGMA outputs on 
the input of the vector (d(i,j)) then (x(i,j)) is the orthogonal projection 
of(d(i,j)) onto Cu pgm a- 

Proof. Let T = {0 = F C Fi C F 2 C • • • C F n _ 2 C = [n]} be the 

chain of flats that define the cone Cjjpgma- Let Ljjpgma be the smallest 
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subspace containing Cupgma- This subspace is denned by 

Lupgma = {(x(i,j)) G : 

x(k, I) = x(s, t) V (k, I), (s, t) G F 1 \ F and 
x(k, I) = x(s, t) V (k, I), (s, t) G F 2 \ F 1 and 

x(k,l)=x(s,t) V (k,l),(s,t) G F n _!\F n _ 2 }, 
and it has an orthonormal basis consisting of the set of vectors 

ivmm - eM) : < = i --»- 1 } 



where e(s,t) G is the standard unit vector corresponding to the edge 
(s, t) of K n . The linear projection formula with respect to this orthonormal 
basis implies that x(v,w) coordinate of the projection of (d(i,j)) is equal to 

\Fk \ F k _i\ ^-f 

with (f,u>) G -Ffc \ Note that if (i>, iu) belongs to the contracted edges 

E(s,t) during the UPGMA that produced then E(s,t) = F k \ F fc _i and 
C(s,t) = \F k \ F k -i\i and therefore the projected vector (x(i,j)) is precisely 
the vector generated by UPGMA. Therefore this projected vector is not only 
in Lupgma but also in Cupgma- This shows that UPGMA performs an 
orthogonal projection of (d(i,j)) onto Cupgma- □ 

Corollary 2.4. When n = 3 UPGMA produces the least squares tree. 

Proof. We assume that d(l,2) < d(l,3) < d(2,3). The two fans B 3 and JF 3 
are identical with three cones described by the three chains of flats 

f 12 = {0c{(l,2)}c^)}, ^3 = {ic{(l,3)}c^)}, ^3 = {0C{(2,3)}C^]}. 

UPGMA produces the tree in Cj? 12 where the leaves labeled with 1 and 2 
form a cherry, and 

x(l,2) = d(l,2) and 3) = x(2, 3) = (d(l, 3) + d(2, 3))/2. 

The square distance of this tree to the data point is (d(2, 3) — rf(l,3)) 2 /2. 
We can orthogonally project the data point onto Ljr 13 and Ljc 23 to obtain 
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x(l,3) = d(l,3), x(l,2) = x(2,3) = (d(l,2) + d(2,3))/2 and x(2,3) = 
d(2,3), x(l,2) = z(l,3) = (d(l,2) + d(l,3))/2, respectively. The first pro- 
jection is in Cj^ 13 if and only if d(l, 3) < (d(l, 2) + d(2, 3))/2, and the second 



projection is never in Cjr 23 unless d(l, 2) = d(l,3) = d(2,3). Theorem 3.4 
implies that the best tree is either the UPGMA tree or the tree obtained from 
Cjc 13 if the projection falls into this cone. Since the square distance from the 
data point to this projection is (d(2, 3) - d(l,2)) 2 /2 > (d(2, 3) - d(l,3)) 2 /2 
we get the result. □ 

Example 2.5. When n = 4 UPGMA tree may be arbitrarily worse than 
the least squares tree. Let the data be (d(i,j)) = (d(l, 2), . . . , d(3, 4)) = 
(1, 2, 20, 10, 28 + e, 5). The UPGMA tree is obtained by contracting the edge 
(1,2) and then (3,4) in K 4 . This gives us (x(i,j)) = (x(l, 2), . . . , x(3, 4)) = 
(1, 15+ ^e, 15+ |e, 15+4e, 15+|e, 5), and the square distance from (d(i,j)) to 
(x(i,j)) is 388 + 31e+ |e 2 . The data point can also be orthogonally projected 
onto the cone C T where T = {0 C F t C F 2 C F 3 = ( [ J)} with F t = {(1, 2)}, 
F 2 \ F 1 = {(1, 3), (2, 3)}, and F 3 \F 2 = {(1, 4), (2, 4), (3, 4)}. The resulting 
point is (y(i,j)) = (1, 6, 6, y + |e, y + |e, y + |e), and the square distance 
from (d(z, j)) to (y(i,j)) is ^ + y^e + |e 2 . The first expression is greater 
than the second one for any e > 0. Indeed, the difference is y^ + ye + ^e 2 , 
and this shows that the UPGMA tree could be arbitrarily bad. 



3 The geometry of projections 

In the preceding section we showed that UPGMA performs an orthogonal 
projection of (d(i,j)) onto a distinguished cone of the complex T n - It is not 
immediately clear whether the least squares equidistant tree is obtained by 
projecting (d(i,j)) orthogonally onto some cone of T n . Such a tree will be 
obtained by locating a point on T n (a polyhedral complex) that is closest to 
(d(i, j)), and in general, nearest point maps of polyhedral complexes do not 
have to be given by orthogonal projections onto the maximal faces: take for 
instance the polyhedral complex in M 2 whose maximal faces are the nonnega- 
tive x-axis together with the nonnegative ?/-axis. For any point with negative 
coordinates the nearest point is the origin. Although this is obtained by an 
orthogonal projection onto the origin, these projections are not orthogonal 
to the maximal faces. In this section, we first show that for T n and the 
Bergman complex B n the unexpected happens. 

We start with a definition. Given a maximal chain of flats T of K n as in 
Section!^ we let Pjr to be the set of points in IrW that orthogonally projects 
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to some point in Cjt. Since Pjr = C^ + Lp where is the smallest subspace 
containing CV, it is clear that P? is also a polyhedral cone. We call this cone 
the projection cone of C?. 

Theorem 3.1. The projection cone Pp is the full- dimensional cone defined 
by the n — 2 inequalities 

£ X{h3) ~ \F k+ \ Fk \ S X M) 

where k = 1, ...,n — 2. The common refinement of P?r over all T is a 
complete polyhedral fan. 

Proof. Let Kj? be the cone defined by the above inequalities. The proof 



of Proposition |2.3| implies that any point in Kjr projects to a point in C?: 
one should only note that if (x(i,j)) satisfies the inequalities then (x(i,j)) + 
(p(i,j)) also satisfies them for any (p(i,j)) in since vectors in do not 
change the averages which are on both sides of these inequalities. Conversely, 
any point in Pjr is of the form (y(i,j)) + (p(i,j)) where (y(i,j)) E which 
trivially satisfies these inequalities. The intersection of any collection of Pjr is 
a nonempty cone since the intersection of all P? contains the line generated 



by (1, 1, ... , 1) (in fact, it is equal to this line). Moreover, Proposition 2.3 

implies that every point in IrW is in some Pjjpgma- This shows that the 
common refinement of Pjc- is a complete polyhedral fan. □ 

At the end of this section we will look more carefully at this polyhedral 
complex obtained by superimposing all Py?. For our main result we need two 
technical lemmas. 

Lemma 3.2. Suppose JF 1 and T 2 are two distinct maximal chains of flats of 
K n . Then the interior of P?\ and the cone Cj^. are disjoint. 

Proof. Suppose P = {0 = F% C F( C • • • C F J n _ 2 C F 3 n _ x = ( [ J)} for 
j = 1,2. Note that the relative interior of Pjn is defined by the inequalities 
defining P T \ except that < are replaced by <. We suppose that the intersec- 
tion of the interior of Pj?i and the cone Cjc-2 is not empty, and we will reach 
a contradiction. Assume that F^ = F 2 for p < q and ^ F 2 . Let (x(i,j)) 
be a point in this nonempty intersection, and let (y(i,j)) be the projection 
of this point onto Cjc-i. Let (s,t) be an edge in F^ \ F^_ 1 . Then we know 
that y(i,j) = a for all E F^ \ and therefore y(s,t) = a. The edge 
(s, t) is in F 2 \ F 2 _ x where r > q since otherwise F^ = F 2 . Since F 2 is a flat 
containing F^_ x = the general theory of matroids implies that F 2 \ F 2 _ x 
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contains F q \ F q _ v Because (x(i, j)) is in we conclude that x(i,j) = b 
for all G F T 2 \F r 2 _ x and therefore x(i,j) = b for all G F^\F^_ V The 
orthogonal projection onto Cjc-i keeps the average of these x(i,j) constant. 
In other words, a = b and x(i,j) = y(i,j) for (i, j) G F q \ F q _ v Now let 
(u, v ) be an edge in F q l +l \ F q . Again we know that y(i,j) = c > a for all 
(i, j) G Fq + i\Fq, including y(u, v) = c. We will show that (u, v) is in F^\F^_ t 
where z > r. Suppose not. Then F 2 \ F 2 x contains F q+1 \ F q _ 1: and this 
implies that x(i,j) = b it j < a for all G F q 1 +1 \F q 1 _ 1 . But then the orthog- 
onal projection argument implies that y(i,j) = c < a for (i, j) G F q+1 \ F q . 
This is a contradiction and we conclude that z > r. The above chain of 
arguments can be applied to all F£ \ for k = q, . . . , n — 1 to produce a 
chain F r 2 C F r 2 +1 C • • • C F 2 ^ where q < r q < r q+ \ < ■ ■ ■ < r„„i < n — 1 
(we have constructed the first two members of this chain, namely r q = r and 
r q+ i = z). However, this is a contradiction since there are only n — 1 — q 
distinct integers bigger than q and at most n — 1. □ 

Lemma 3.3. Suppose T 1 and T 2 are two distinct maximal chains of flats 
in K n . If P r i n Cjc-2 is nonempty, then this intersection is contained in 



Proof. This proof invokes similar ideas as in the proof of Lemma 3J2 Suppose 
p = {0 = F° G C F{ C ••• C F 3 n _ 2 C = ( [ J)} for j = 1,2. Assume 

that Fp 1 = F p 2 for p < q and F^ 1 7^ F 2 . Let (x(i,j)) be a point in P^i n C>2, 
and let (y(i,j)) be the projection of this point onto C^i. We will show 
that x(i,j) = y(i,j) for all (i, j). By our assumption this is true for all 
(i,j) G F q 1 _ 1 = -F 2 _ 1 . Let (s,t) be an edge in F q \ F q 1 _ 1 . Then we know 
that y(i,j) = a for all (i,j) G F 1 \ and therefore y(s,t) = a. The edge 
(s,t) is in F 2 \ F 2 _ x where r > q since otherwise F^ 1 = F 2 . Since F 2 is a 
flat containing F 2 ^ = F q _ x we conclude that F 2 \ F 2 ^ contains F^ 1 \ Fi^. 
Because j)) is in C^2 we have x(i,j) = b for all (i, j) G F 2 \ F 2 _ x and 
therefore x(i,j) = b for all G F 1 \ Fi_ x . The orthogonal projection 

onto Cjri keeps the average of these x(i,j) constant. In other words, a = b 
and x(i,j) = y(i,j) for (i,j) G F 1 \ F 1 _ 1 . Now let (u,v) be an edge in 
F q+1 \ F q . Now we know that y(i,j) = c > a for all (2, j) G F (? 1 +1 \ Fj, 
including y(u, v) = c. Furthermore (u, v) is in F 2 \ F z 1 _ 1 where 2; > q — 1. If 
z < r then F x +1 \ F q _ 1 is contained in F 2 , and therefore x(i,j) = fejj < a for 
all (i, j) G F (? 1 +1 \ F 1 . If x(i,j) < a for any of these edges, then the average 
of x(i,j) in this set is strictly less than a. But this average is equal to the 
average of y(i, j) for (i,j) G F q+1 \ F q , and we get a contradiction since this 
implies c < a. Therefore, if z < r then a = c and x(i,j) = y{i,j) for all (i, j) 
in F^yFg 1 . If z > r, then F 2 \F 2 2 _ 1 contains F^yFg 1 . Therefore x(i,j) = b 
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for all G ^l+i \ Fq, and t ne average of these x(i,j) is b. But b = c since 
the average of x(i,j) and y(i,j) is constant for (i, j) G \ F 1 . But this 
implies x(i,j) = y(i,j) for edges in this set as well. Now we can repeat the 
same argument for the rest of F£ \ Fl_ v 

□ 

Theorem 3.4. Let Vd be the set of projection cones containing the data point 
(d(i,j)). Then the best least squares equidistant tree corresponds to a point 
in Cy for some P? G Vd- 

Proof. Let x = (x(i,j)) be the point corresponding to the best least square 
equidistant tree, and let C be the cone of T n where x G C. We let P 
be the projection cone of C . If the line segment x — d is orthogonal to C, 
then P G Vd and we are done. If not, we show that x is on the boundary 
of C (and hence P). Suppose that x is in the relative interior of C, and 
hence in the interior of P. If x — d is entirely contained in P, then it must be 
perpendicular to C and this would imply that P G Vd- Hence let y = (y(i,j)) 
be the first point on x — d which intersects P. Clearly, y must be on the 
boundary of P. If we let y' be the orthogonal projection of y onto C, then 
we conclude that \\y' — d\\ < \\y — d\\ + \\y'— y\\ < \\y — d\\ + \\x — y\\ = \\x — d\\, 
and this is a contradiction. Hence x is on the boundary of C and P. Now 
let V x be the projection cones containing x. Since x is on the boundary of 



P G V x , and by Lemma 3^2 none of the projection cones in V x can contain 
x in their interiors. If we let V x = {-fVu • • • j-fV*.}) then by Lemma 3.3 x 



is contained in (the boundaries of) Cjt i: . . . , Cp k . Now only two things can 
happen. Either x — d is entirely contained in one of P^ where % = 1, . . . , k, 
in which case this cone also belongs to Vd, and we are done, or otherwise for 
each Pj: i there is a first point yi on x — d intersecting Pp. . Repeating the 
above argument we can conclude that y^ = x for all i. This means that for 
every point y on x — d V y is disjoint from V x . But since the projection cones 
are closed cones this is a contradiction, unless x = d. □ 



In the light of Theorem 3.4 we introduce a graph Q n ^d associated to each data 
point d = (d(i,j)) in IR^). The vertices of this graph are Cjr where P?r £ Vd, 
and there is an edge between two vertices Cjri and Cjr2 if these two cones in 
T n share a facet. 

Proposition 3.5. The graph Qs,d is either K 1} K 2 or K 3 , and when n > A, 
Qn t d could have more than one component. 

Proof. At most two out of the three inequalities d\ 2 < [d\^ + d 2 3)/2, d\z < 
(di2 + g? 23 )/2, and d 2 3 < (d\ 2 + c?i 3 )/2 hold unless d\ 2 = dis = d 2 ^. In the 
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latter case Qd = 
ties are satisfied, 
disconnected: 



K 3 , and otherwise Qd = Kj if j = 1,2 of the inequali- 
We use the following data to illustrate that Q^d can be 



(d(l,2),d(l,3),rf(l,4),rf(2,3),rf(2,4),rf(3,4)) = (1,2,3,2,7,3). 

The set Vd consists of four cones, and hence there are four trees one can 
obtain. The component of the UPGMA tree has a total of two trees, and 
there are two more components where each component is just one tree. □ 

We finish this section by studying the polyhedral complex we defined in 



Theorem |3.1[ namely the common refinement of the projection cones for 
each maximal cell Cjr in the Bergman complex B n . We denote this complex 

by Q n . Note that Q n is full dimensional complex in Ir( 2 ), and the interior of 
a full dimensional cell in Q n consists of those data vectors which project to 
the same set of Cy. 

Example 3.6. When n = 3 the complex Q 3 is easy to describe. There are 
total of six maximal cells which are of two different types. The first type 
consists of those vectors which project to exactly one of the three Cp. The 
second type consists of those vectors which project to exactly two Cy. 

Example 3.7. When n = 4 at most six distinct projection cones could have 
an intersection that gives a maximal cell in Q4 as we checked with a short 
MAPLE program. There are a total of 166 such cells, but they come in ten 
different orbits with respect to the action of 5*4. The following table lists a 
representative of each orbit. Since the projection cones are indexed by binary 
trees on four leaves, we just list these trees. 



orbit representative 


orbit size 


(((1,2) 


3.4 1,2 


4) 


3 ( 


((1,3), 2) 


4 


((1,3) 
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3)( 
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24 
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((1,4) 
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3 ( 
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(2,3)) 
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((1,4) 
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((2,4), 1), 3) ( 


(1,3) 
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((2,3) 
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(1,3) 
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3 ,4 ((1,2) 
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3 ( 


((1,3), 2) 


4 


((2,4) 


,1) 


3 ( 
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(1,4) 
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4)( 


((1,4), 2) 
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((2,4) 
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6 



Theorem 3.8. There is a maximal cell in Q n which is the intersection of 
at least (n — 1)! projection cones; i.e., there are data vectors in Ir( 2 ) which 
orthogonally project onto at least (n—l)\ (non-degenerate) equidistant trees. 
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Proof. Let a < b two real numbers and let x(i,j) G M' 2 ' be the data vec- 
tor where x(l,j) = a for j = 2,...,n and x(i,j) = b for all other com- 
ponents. We claim that this vector is in the interior of the intersection 
of (n — 1)! projection cones corresponding to the comb trees of the form 
(■ • • ((1, a%), 03) • • • ), a n ) where 02, a 3 , . . . , a„ run through all permutations of 
{2, . . . , n}. For any one of these trees our data vector is in the interior of the 
corresponding projection cone if and only if 

a + b a + 26 a + 3b a + (n - 2)b 

a < < < < ■ ■ ■ < — . 

2 3 4 n - 1 

The above inequalities hold for the choice of a and b we made. This proves 
the theorem. □ 

This theorem asserts that there are maximal cells in Q n that are intersections 
of at least (n — 1)! projection cones. We believe that the number of such cones 
cannot exceed (n — 1)!, though we do not have a proof. 

Conjecture 3.9. The maximal cells in Q n are obtained as the intersection 
of at most (n — 1)! projection cones. 



4 Extended UPGMA and Branch-and-Bound 

In view of the results in Section [3] we propose two algorithms. The first one 
is an extension of the usual UPGMA which searches the component of the 
graph Q n ^ to which the UPGMA tree belongs to. Even when this component 
is large this extended UPGMA algorithm performs well and finds the best 
tree in this component. The drawback of this algorithm is that it may not 
produce the best tree. Our second algorithm is an exact algorithm which 
produces the best equidistant tree with a branch and bound approach on the 
space of maximal chains of the lattice of contractions of K n . We will present 
this as a shortest path algorithm on the Hasse diagram of this lattice. Recall 
that this lattice is isomorphic to the partition lattice U n where maximal 
chains are in bijection with the maximal cones in T n . 

Algorithm 4.1. Extended UPGMA 

Input : Complete graph K n with edge weights d(i,j). 

Output: An equidistant tree T with leaves X = {1, . . . ,n} and x(i,j) for 

each i, j G X. 
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Using Algorithm 1.1 find the UPGMA tree Tjjpgma and the corresponding 
cone Cjjpgma in the Bergman compex B n . 

Let Visited := {T UPGMA }, Active := {T UPGMA }, and T best := T UPGMA . 
while Active 7^ do 

Let T G Active and Active := Active \ {T}. 

for each Ct' G B n which shares a facet with Cp do 
if C T > G V d and V G" Visited then 

Active := Active U {T'} and Visited := Visited U {V} 
If J2(d(i,j)-x T ,(i,j)) 2 < E(*-j)-^(y)) 2 thenT 6est := V . 
end if 
end for 
end while 



Output T best and x Tbest (i,j) 1 < i < j < n. 



A few remarks about Algorithm 4.1 are in order: This algorithm searches 



the component of Q n ^ to which Cupgma belongs, and it outputs the best 
equidistant tree in this component. If Q n ^ consists of a single component 
then the algorithm's output is the optimal tree. The search depends on the 
following characterization of Ardila and Klivans [2] when Cpi and Cpi share 
a facet in B n . Finally, checking whether a cone Cp belongs to Vd is trivial by 
Theorem 13.11 

Proposition 4.2. Two maximal cones Cjn andCpi share a facet in B n if and 
only if there exists < j < n — 1 such that = Ff for all i — 0, . . . , n — 1 
except Fj ± Ff and (Fj \F}_ 1 )n {Ff \Fj_ 1 )±$. 

Our exact algorithm is a modified shortest path algorithm performed on the 
Hasse diagram of the partition lattice IT n . We first introduce some notation 
for this algorithm. We will represent this Hasse diagram as a directed graph 
where the nodes are labeled by flats of K n , and the edges are directed from 
the minimum element (corresponding to the empty flat) to the top element 
(corresponding to the flat [n] = {1, . . . , n}). For each node (flat) F we will 
keep track of incoming edges Ip and outgoing edges Op. Each edge is directed 
from a flat Fj to a flat Fi + \ of next rank such that Fi C -Fj+i- Each such 
edge e will have two associated numbers, x(e) and £(e), which will be defined 
throughout the algorithm using the given data (d(i,j)). 
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Algorithm 4.3. Exact least squares 

Input : Complete graph K n with edge weights d(i,j). 

Output: The best least square equidistant tree T with leaves X — {1, . . . , n} 

and x(i,j) for each i,j G X. 

Set Active := {0}, 1% := {g}, x(g) := — oo, and £(g) := 0. 
Set V := Active and A := {}. 
for k — 0, 1, . . . , n — 1 do 
Active fe+1 := {}. 

while Active^ 7^ do 
Let F G Active^ and Active^ := Active^ \ {F}. 
for each e = (F, F') G F do 
Set x(e) := E(ij)eF'\F d (hj) and E := {f e I F : x(e) > 

if F = then x(e) := +00 else h := argmin{£(/) : / G E} end if 
i(e) = 1(h) + w(e) where w(e) = E(i,j)eF'\F( x ( e ) _ d (hj)) 2 
if x(e) < +00 then 

Active fc+ i := Active fc+ i U {F'} and A:= AU {e} 
end if 
end for 
end while 
V := V U Active fe+ i. 
end for 

Find the shortest path P from to [n] in the graph G = (V, A) with edge 
weights w(e) for e G A. 

Output the tree T corresponding to P and xt(j, j) 1 < z < j < n. 

Proof of Correctness: Each path P in G from the empty flat to the full flat 
corresponds to a flag T and hence a cone Cjf. By the construction of G, 
the x(e) for the edges e on such a path give a point in Cjc, and this point 
is the orthogonal projection of (d(i,j)) onto Cp. In other words, a path 
P in G corresponds to a cone Cjc g TV Since ^ egP «)(e) is the Euclidean 
distance from the data point to the projection in Cjr, Theorem |3.4| implies the 
correctness of the algorithm if for each Cjc g Vd there is a path P in G from 
to [n]. We show by induction on i that G contains the edges e, = Fj) 
corresponding to the flag T . It is trivial to check that t\ = (0, Fi) is in G. 
Moreover x(ei) = d(i,j) where F\ = We assume that for k < i — 1 

are in G. Note that each is added to G during the fcth pass of the outermost 
for loop. Now x(ei) = ^.y^ YlfrfieF^Fi-i d (hj)i and because C T G V d we 
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conclude that x(ej) > x(ej_i). Since ej_i G i^-i the set E during the pass 
of the innermost for loop corresponding to is nonempty and hence x{ei) 
stays finite. This means e« is added to G. □ 



For the purposes of the exposition of Algorithm 4.3 we have chosen to 
first construct the graph G in the algorithm and then solve the shortest path 
problem on this graph. In fact one can skip the construction of G if one 
adds a pointer to each edge e that points to the corresponding edge h in the 
algorithm. With these pointers one can reconstruct the shortest path and 
hence the best equidistant tree T at the end of the algorithm. Note that 
this algorithm is a branch and bound algorithm on the space of all maximal 
chains in U n starting from the empty flat: whenever x(e) = +00 for some e 
being considered then all such maximal chains containing e are pruned from 
the branch and bound tree. The branching step is realized when we extend 
a chain terminating at the node labeled F by adding e = (F,F') G Op for 
all such edges where x(e) < +00. 



5 A biology example 

How does the least squares approach compare to Bayesian and maximum 
likelihood methods in practise? We compared the different methods on a 
problem in evolution for which some of the data shows clock-like behavior. 
Murphy et al. (2001) have studied the timing and sequence of appearence of 
the mammalian orders using a large DNA database that includes 42 placental 
mammals from all orders, plus two marsupials as the outgroup. The model of 
sequence evolution employed by Murphy et al. (and by us) was the general- 
time-reversible+r+invariants model. Bayesian and maximum likelihood 
methods converged on the same combinatorial type of tree (Murphy et al., 
2001). Distances estimated during likelihood fitting using this model do 
not satisfy the clock hypothesis over the complete dataset; however a subset 
of eleven species do show clock-like substitution rates (Murphy et al.,2001, 
supplemental material). Distances from ten of these taxa were analyzed here 
using the exact least squares algorithm. The main conclusions of Murphy et 
al. on the branching sequence are supported by the best least squares tree: 
first the Afrotherians, then the Xenartharns and finally the Boreoeutherians 
separate from their placental ancestors. The only difference between the 
ten taxa least squares and maximum likelihood trees is the position of the 
dolphin. Murphy et al. scaled their tree to obtain dates using 50 mya 
for the cat/canid divergence. Scaling the best least squares tree in the 
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Figure 3: Example of mammalian phylogeny obtained by extended UPGMA 

same way gives 107 and 101 million years ago for the bifurcations producing 
the Afrotherians and Xenarthrans, respectively. The corresponding values 
reported by Murphy et al., 2001, are 103 and 95 million years ago. Hence 
the agreement between the least squares and Bayesian or likelihood methods 
is quite good. 

Visual inspection of the complete phylogram from the 44 taxa dataset 
suggested that others were nearly contemporaneous with those in the eleven 
taxa subset. For a sequence of datasets ranging from eleven to nineteen 
species, three trees were identified: we found the maximum likelihood tree, 
the maximum likelihood equidistant tree, and the best least squares equidis- 
tant tree. Species added to the eleven taxa subset were the roussette fruit 
bat, anteater, whale, hippopotamus, aardvark, human, horse and sciurid. 
The inexact form of the least squares equidistant tree algorithm was used 
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on these datasets with more than ten taxa. For the trees with twelve up 
to eighteen taxa the number of possible least squares trees was either one or 
two, with the best least squares tree being the UPGMA tree in each case. 
The eighteen taxa dataset had two possible trees, the better was the non- 
UPGMA tree (Figure 3). When the sciurid data was then added to create 
a dataset with nineteen taxa, the number of possible trees jumped to six. 

The number of possible trees for datasets up to eighteen taxa is small 
compared to the conjectured (n — 1)! upper limit of trees, indicating that 
the distances were close to clock-like. Hence the corresponding equidistant 
trees should be good approximations to the phylogeny. As distances that 
deviate more from clock-like behavior are added, the number of possible trees 
increases and the equidistant tree gives a poorer account of the phylogeny. 

When two least squares equidistant trees were possible for a given dataset, 
the oldest bifurcations were conserved between the two, with the differences 
appearing in more recent branchings. This observation is expected. For a 
given internal node the distance to a leaf is one-half the average of all path 
lengths between pairs of leaves that pass through that node. More paths pass 
through the older nodes, so their ages are estimated more accurately. Unless 
old bifurcations occur very close to each other, they will be more stable in 
the set of possible trees. The best least squares trees with up to eighteen 
taxa all confirmed the branching order of the Afrotherians, Xenarthrans and 
Boreoeutherians observed by Murphy et al., 2001. 

There was one persistent difference between the likelihood and least squares 
approaches: the equidistant least squares trees placed the cetartio dactyls as 
an outgroup to the carnivores, bats and pangolin, whereas the maximum 
likelihood trees put the bats as an outgroup. It is a bit surprising, since the 
likelihood and distance methods are both consistent in the statistical sense, 
and therefore expected to converge on the same, correct, tree (Felsenstein, 
2004). The dataset contains 17028 characters, but perhaps more data is 
needed, or a different sample of sequences, to obtain convergence on one 
tree. 
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